Predictive ability of hypotension prediction index and machine learning methods in intraoperative hypotension: a systematic review and meta-analysis

Introduction: Intraoperative hypotension (IOH) poses a substantial risk during surgical procedures. The integration of artificial intelligence (AI) in predicting IOH holds promise for enhancing detection capabilities, providing an opportunity to improve patient outcomes. This systematic review and meta-analysis explores the intersection of AI and IOH prediction, addressing the crucial need for effective monitoring in surgical settings.
Method: A search of PubMed, Scopus, Web of Science, and Embase was conducted. Screening involved two-phase assessments by independent reviewers, ensuring adherence to predefined PICOS criteria. Included studies focused on AI models predicting IOH in any type of surgery. Due to the high number of studies evaluating the hypotension prediction index (HPI), we conducted two sets of meta-analyses: one involving the HPI studies and one including non-HPI studies. In the HPI studies the following outcomes were analyzed: cumulative duration of IOH per patient, time-weighted average of mean arterial pressure < 65 mmHg (TWA-MAP < 65), area under the threshold of mean arterial pressure (AUT-MAP), and area under the receiver operating characteristic curve (AUROC). In the non-HPI studies, we examined the pooled AUROC of all AI models other than HPI.
Results: 43 studies were included in this review. Studies showed a significant reduction in IOH duration, TWA-MAP < 65 mmHg, and AUT-MAP < 65 mmHg in groups where HPI was used. The HPI algorithm demonstrated strong predictive performance (AUROC = 0.89; 95% CI: 0.88, 0.92). Non-HPI models had a pooled AUROC of 0.79 (95% CI: 0.74, 0.83).
Conclusion: HPI demonstrated an excellent ability to predict hypotensive episodes and hence reduce the duration of hypotension. Other AI models, particularly those based on deep learning methods, also showed a strong ability to predict IOH, while their capacity to reduce IOH-related indices such as duration remains unclear.
Supplementary Information The online version contains supplementary material available at 10.1186/s12967-024-05481-4.


Introduction
Every year more than 300 million surgeries are conducted worldwide [1], resulting in a significant number of patients experiencing possible intraoperative complications. One of the most common complications associated with both cardiac and non-cardiac surgeries is intraoperative hypotension (IOH) [2]. In non-cardiac surgeries, IOH has been associated with myocardial injury [3], acute kidney injury [4], and death [5]. Additionally, in cardiac surgeries, hypotension during cardiopulmonary bypass (CPB) has been associated with a significant risk of ischemic stroke, with the risk increasing as more time is spent in a hypotensive state. As a consequence, it is imperative that the duration of IOH be minimized during surgeries. The current model of managing IOH is mostly reactive, and treatment often occurs with delay [6]. However, the perilous ramifications of IOH have recently pushed researchers towards more proactive approaches to its treatment.
Even though efforts have been made to identify the epidemiological factors predisposing patients to IOH in order to estimate its risk during surgeries [7,8], these efforts have not been helpful in reducing IOH in clinical settings [9]. In contrast, AI models have demonstrated high efficiency in predicting IOH in real time and providing the clinician with enough time to act before the onset of a hypotensive episode. Furthermore, AI models have proven useful in predicting and preventing hypotension-induced complications, including sepsis [10] and acute kidney injury [11]. These real-time IOH prediction models have proven so valuable that one logistic regression (LR) model in particular, the Hypotension Prediction Index (HPI) [12], is now commercially available for use and has prompted many clinical trials and validation studies testing its efficiency in predicting IOH.
As studies assessing the efficiency of AI models, including HPI, have been accumulating in the literature, we set out to compile the existing evidence regarding the efficacy of such models, as measured by the area under the receiver operating characteristic curve (AUROC). This review aims to examine the current state of AI research in IOH prediction. Further, it aims to quantify and compare the efficiency and predictive ability of both HPI and non-HPI models.

Methods
This systematic review and meta-analysis was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [13] in search of papers developing or validating artificial intelligence methods for the prediction of intraoperative hypotension. The protocol for this review was prospectively registered on PROSPERO (CRD42024504636). Due to the high number of studies evaluating the hypotension prediction index (HPI), those studies were extracted and analyzed separately.

Search strategy
An online database search of PubMed, Scopus, Embase, and Web of Science was conducted on December 9th, 2023 with MeSH terms and keywords synonymous with "artificial intelligence" and "intraoperative hypotension". No publication date or language limitations were defined. A manual citation search of the included studies was also performed after finalization of the screening process.

Eligibility criteria
For the purposes of our study, IOH was defined as hypotension occurring after the induction of anesthesia, regardless of the specific blood pressure cut-off that was utilized (e.g., MAP < 65 mmHg or MAP < 55 mmHg) or the timeframe during which IOH was detected by different studies (e.g., from tracheal intubation to incision, after incision, etc.).
Studies were included if they complied with the following PICOS:
Population: patients undergoing cardiac or non-cardiac surgeries with intraoperative blood pressure monitoring.
Intervention: predictive models utilizing artificial intelligence which are either being developed or being validated.
Comparator: standard intraoperative care or non-AI models, if applicable.
Outcomes: area under the receiver operating characteristic curve (AUROC) for the prediction of intraoperative hypotension, as defined by each study, for AI models under development or validation; and, for AI models under validation, the duration of hypotension as defined by each study, the time-weighted average of hypotension (TWA-MAP < 65 mmHg), or the area under the threshold for hypotension (AUT-MAP < 65 mmHg).
Study design: AI development papers or controlled studies validating AI models.
Papers were excluded if they did not include human participants, only used AI models for feature selection, or if they were conference abstracts.

Study selection
Studies were screened by two independent reviewers in two phases: an initial title and abstract screening, followed by full-text retrieval and evaluation. Discrepancies between the two reviewers were resolved through discussion with an independent third reviewer.

Data extraction
Data were extracted into one of two preconstructed Excel spreadsheets by two independent reviewers under the supervision of a third reviewer. The first spreadsheet contained the following data regarding studies not utilizing the hypotension prediction index: author, year of publication, country, type of surgery (cardiac or non-cardiac), type of dataset source, population size, population age, population gender, predicted outcome, number of variables used in the model, mode of validation, best-performing algorithm, number of hypotensive events, number of hypotensive patients, and area under the receiver operating characteristic curve (AUROC).
The second spreadsheet included the following data regarding studies employing the hypotension prediction index: author, year of publication, country, type of study, type of surgery, type of comparator, group size, age and gender of the groups, number of intraoperative hypotensive events, number of hypotensive patients, duration of intraoperative hypotension, time-weighted average of hypotension (MAP < 65 mmHg), area under the threshold of MAP < 65 mmHg (mmHg*min), and AUROC for the hypotension prediction index.

Outcomes
The following were chosen as the main outcomes of this review: AUROC for HPI and non-HPI studies and TWA-MAP < 65, duration of intraoperative hypotension, and AUT-MAP < 65 for HPI studies only.
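To make the three HPI-specific outcomes concrete, the following minimal sketch computes them from an evenly sampled MAP trace. It assumes the commonly used definitions (AUT as the depth below the threshold integrated over time, and TWA as AUT divided by total monitoring time) and a simple rectangle rule for integration; the included studies may have used different sampling intervals or integration schemes, and the function name and arguments are hypothetical.

```python
def hypotension_burden(map_mmhg, interval_min, threshold=65.0):
    """Summarize hypotension exposure for one evenly sampled MAP trace.

    Assumed definitions (not taken from any single included study):
      duration : total minutes with MAP below the threshold
      AUT      : sum of (threshold - MAP) * interval, in mmHg*min
      TWA      : AUT divided by total monitoring time, in mmHg
    """
    if not map_mmhg:
        raise ValueError("MAP series is empty")
    if interval_min <= 0:
        raise ValueError("sampling interval must be positive")

    # Depth below the threshold at each sample (0 when MAP is at or above it).
    depths = [max(0.0, threshold - m) for m in map_mmhg]

    duration_min = sum(interval_min for d in depths if d > 0)
    aut_mmhg_min = sum(d * interval_min for d in depths)
    total_time_min = len(map_mmhg) * interval_min
    twa_mmhg = aut_mmhg_min / total_time_min
    return duration_min, aut_mmhg_min, twa_mmhg


# Illustrative values only: four one-minute samples, two of them hypotensive.
duration, aut, twa = hypotension_burden([70, 60, 55, 66], interval_min=1.0)
print(duration, aut, twa)
```

Because TWA normalizes by monitoring time, it allows comparison between procedures of different lengths, whereas duration and AUT grow with the length of surgery.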

Quality assessment
Quality assessment of non-HPI studies was conducted using the PROBAST tool [14], while quality assessment of studies utilizing HPI was done using the Jadad scale [15] for controlled trials and the Newcastle-Ottawa scale [16] for observational studies.

Statistical analysis
Meta-analysis of the duration of hypotension, TWA-MAP < 65, and AUT-MAP < 65 of the HPI studies was conducted using the Comprehensive Meta-Analysis software (CMA, version 3, NJ, USA) utilizing a random effects model with standardized mean difference (SMD) and its corresponding 95% confidence interval (CI) as the effect size. Means and standard deviations were the only accepted data entry form, and medians with interquartile ranges were converted to means and standard deviations using methods outlined by Wan et al. [17] and Luo et al. [18].
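As an illustration of the conversion step, the sketch below estimates a mean and standard deviation from a reported median, interquartile range, and sample size. It assumes the commonly cited forms of the estimators for the quartile-only scenario (the weighted mean attributed to Luo et al. and the SD estimator attributed to Wan et al.); it is not the authors' actual workflow, and the function name is hypothetical.

```python
from statistics import NormalDist


def mean_sd_from_quartiles(q1, median, q3, n):
    """Approximate (mean, SD) from the first quartile, median, third quartile
    and sample size, assuming roughly normally distributed data."""
    if n < 2:
        raise ValueError("sample size must be at least 2")
    if not q1 <= median <= q3:
        raise ValueError("expected q1 <= median <= q3")

    # Mean: sample-size-dependent weighting of the quartile midpoint and median.
    w = 0.7 + 0.39 / n
    mean = w * (q1 + q3) / 2.0 + (1.0 - w) * median

    # SD: IQR divided by the expected IQR of a standard normal sample of size n.
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2.0 * z)
    return mean, sd


# Illustrative values only (not data from the included studies).
print(mean_sd_from_quartiles(q1=10.0, median=20.0, q3=30.0, n=100))
```

For large samples the SD estimate approaches IQR / 1.35, the familiar normal-theory approximation. Both estimators can be unreliable for strongly skewed data.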
Meta-analysis of the AUROC for both the HPI and non-HPI studies was done using Stata version 18 (StataCorp. 2023. Stata Statistical Software; College Station, TX, USA), employing a random effects model with the restricted maximum-likelihood method. The AUROC of the HPI studies was subgroup meta-analyzed by the time before prediction of intraoperative hypotension, while the AUROC of the non-HPI studies was subgroup analyzed by algorithm type and by the definition of IOH used to construct the models. In addition, subgroup analyses were conducted for all of the primary outcomes in both types of studies according to the quality of the studies.
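The general shape of a random-effects pooling of AUROCs can be sketched as follows. Note that this example substitutes the closed-form DerSimonian-Laird estimator for the restricted maximum-likelihood method used in the review, because REML requires iterative optimization; it also assumes that each study's variance can be back-calculated from a symmetric 95% CI and that AUROCs are pooled on their raw scale. All names and input values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def pool_random_effects(estimates, lowers, uppers):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled, ci_low, ci_high, tau2), where tau2 is the estimated
    between-study variance.
    """
    k = len(estimates)
    if k < 2 or len(lowers) != k or len(uppers) != k:
        raise ValueError("need at least two studies with matching CI bounds")

    # Back-calculate each study's variance from its reported 95% CI.
    variances = [((hi - lo) / (2 * Z_95)) ** 2 for lo, hi in zip(lowers, uppers)]
    if any(v <= 0 for v in variances):
        raise ValueError("each upper bound must exceed its lower bound")

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))

    # Method-of-moments estimate of the between-study variance, floored at 0.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to every study's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled - Z_95 * se, pooled + Z_95 * se, tau2


# Illustrative AUROCs and CIs only (not data from the included studies).
print(pool_random_effects([0.85, 0.90, 0.78], [0.80, 0.87, 0.70], [0.90, 0.93, 0.86]))
```

When between-study variance is large, the random-effects weights become nearly equal across studies, so small and large studies contribute similarly to the pooled estimate.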
Heterogeneity was assessed using the I² statistic, with an I² > 50% signifying substantial heterogeneity [19]. A p-value of < 0.05 was considered statistically significant. Sensitivity analysis was conducted using the leave-one-out method, and publication bias was assessed using Egger's regression test (p-value < 0.05) and funnel plot symmetry if at least 10 studies were included in the meta-analysis.
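For reference, the I² statistic can be computed from Cochran's Q as sketched below. This is the standard textbook definition, I² = max(0, (Q − df) / Q) × 100; the values reported by Stata under a REML model may be derived slightly differently, and the function name and inputs are hypothetical.

```python
def cochran_q_and_i2(estimates, standard_errors):
    """Return (Q, I2 in percent) for a set of study estimates.

    Q is the inverse-variance weighted sum of squared deviations from the
    fixed-effect mean; I2 is the share of Q in excess of its degrees of freedom.
    """
    k = len(estimates)
    if k < 2 or len(standard_errors) != k:
        raise ValueError("need at least two studies with matching standard errors")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in standard_errors]
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))

    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Illustrative values only: two precise studies that disagree strongly.
print(cochran_q_and_i2([0.70, 0.90], [0.01, 0.01]))
```

Very precise studies with even modestly different estimates produce a large Q, which is one reason I² values close to 100% are common in meta-analyses of AUROCs reported with narrow confidence intervals.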

Results
The initial search yielded 1705 records, with 997 studies eligible for screening after duplicate removal. A total of 67 studies were then chosen for full-text evaluation and 43 of them were included in our review. Of the included studies, 22 were HPI studies, while 21 developed non-HPI models to predict IOH (Fig. 1).

AUROC
Seven of the HPI studies assessed the performance of this algorithm in the prediction of IOH 5 to 15 min before the event [12, 22-24, 26, 29, 32]. After pooling the studies, we found that the HPI algorithm has a high AUROC of 0.89 (95% CI: 0.88, 0.92) between 5 and 15 min prior to the event. When stratified by time to event prediction, the algorithm performs slightly, though not significantly, better 5 min before the event (AUROC: 0.92; 95% CI: 0.88-0.95) than 10 min (AUROC: 0.88; 95% CI: 0.82-0.93) and 15 min prior to IOH (AUROC: 0.86; 95% CI: 0.80-0.93; Fig. 5). Very high heterogeneity is present in this meta-analysis (I² = 99%) and inside each subgroup (I² = 99%). Sensitivity analysis shows our findings to be stable (data not shown), and publication bias was not assessed because fewer than 10 studies were included in the meta-analysis.

AUROC
Ten of the 21 non-HPI studies [9, 41, 44, 45, 48, 49, 52-54, 60] reported the AUROC of their AI models and were subgroup meta-analyzed by the algorithm employed in their models. Our meta-analysis found an AUROC of 0.79 (95% CI: 0.74, 0.83) when all algorithms were pooled together, with substantial heterogeneity (I² = 99%; Fig. 6). Sensitivity analysis showed our findings to be stable (data not shown), yet significant publication bias was present (Egger's test p-value < 0.01; Additional file 1: Figure S6). In addition, subgroup analyses based on the definition of hypotension revealed that significantly higher AUROC values are achieved when IOH is defined as MAP < 65 mmHg, while the other definitions did not differ significantly from one another. Additionally, high heterogeneity was present in each subgroup (Additional file 2: Figure S7).

Quality assessment
Among the HPI studies, there were 13 cohort studies, 4 of which were judged to be of high quality, and the remaining 9 were assessed to be of unclear quality due to the absence of control groups. Out of the 8 randomized controlled trials, 4 had excellent quality according to the Jadad scale, while there were some concerns about 3 studies and one had low quality. The HPI development study was also of unclear quality due to a lack of information about the analyses conducted to develop the model (Fig. 7). Subgroup analyses of the HPI studies revealed that none of the measured outcomes (namely 5-, 10-, and 15-minute AUROCs, TWA-MAP < 65, duration of intraoperative hypotension, and AUT-MAP < 65) were significantly different among studies of different quality (Additional file 2: Figures S8-S13). Among the non-HPI studies, 9 were judged to be high quality, 10 were of unclear quality, and 2 had low quality. The main cause of unclear quality among the studies was the unclear status they received in the analysis domain of the PROBAST tool (Fig. 7). Subgroup analyses revealed that high-quality studies obtained significantly lower AUROCs compared to those of unclear quality (Additional file 2: Figure S14).

Discussion
Intraoperative hypotension is a common phenomenon and has been associated with major postoperative complications such as major adverse renal [62], neurological [63], and cardiac [64] events and a high rate of mortality [65]. Despite these serious sequelae, the current management of IOH is predominantly reactive and often results in lost time and longer IOH periods [6,35,49]. To remedy this issue, proactive approaches such as prediction models, specifically those employing AI, have recently been developed and tested in the relevant literature. Our study is the first systematic review and meta-analysis of the performance of AI models for the prediction of IOH. We found that one specific AI model, HPI, has been studied in 21 RCTs and cohort studies that externally validated it, and hence decided to evaluate it separately. In total, HPI-guided care was shown to significantly outperform standard in-house protocols, reducing the cumulative duration of IOH per patient, TWA-MAP < 65, and AUT-MAP < 65. Furthermore, the pooled AUROC of HPI among the studies was 0.89, with the model demonstrating statistically similar performance 5, 10, and 15 min before each hypotension event. In addition, 21 studies reported on the development of AI models other than HPI. Collectively, the non-HPI models achieved an AUROC of 0.79, with models incorporating recurrent, convolutional, and deep neural networks performing the best, and SVM models performing the worst.
Fig. 4 Forest plot for the meta-analysis of hypotension prediction index studies comparing the area under the threshold for hypotension (MAP < 65 mmHg) between hypotension prediction index guided participants and participants receiving standard in-house protocols. a [33]. b [27]
Fig. 5 Meta-analysis of the AUROC of the hypotension prediction index at 5, 10, and 15 minutes before the occurrence of intraoperative hypotension. The overall AUROC of the hypotension prediction index for predicting intraoperative hypotension is presented at the bottom of the figure. a [25]

Hypotension prediction index
Regarding HPI, the results of our analyses are in accordance with a previous systematic review of RCTs evaluating the performance of HPI during non-cardiac surgeries. Li et al. included 5 studies, representing 461 patients, and ultimately found that the median differences of medians indicated an improvement in hypotension-related endpoints such as duration, incidence, percentage, TWA-MAP < 65, and AUT-MAP < 65 [66]. In contrast, our study was not limited to RCTs in order to gather the limited evidence in this novel subject matter more effectively, and as a result, we were also able to meta-analyze the AUROCs obtained by external validation studies of HPI, demonstrating its excellent predictive ability, particularly compared to other LR-based models. In addition, while HPI was originally designed based on invasive arterial line waveforms, many of the included reports have utilized it in combination with non-invasive arterial pressure waveforms and achieved comparable results [22,23,27,28], expanding its applicability to potentially any patient undergoing surgery. It is also noteworthy that while the original HPI model was developed to predict IOH in non-cardiac surgeries, Shin et al. conducted a cohort study testing the feasibility of its use in patients undergoing cardiac surgery requiring cardiopulmonary bypass, and concluded that HPI predicted IOH with a high degree of sensitivity and specificity [32]. As these results are promising, we recommend further research be conducted to examine HPI's efficacy in cardiac surgeries.

Non-HPI AI models
With respect to the non-HPI studies, we observed an excellent prediction ability among the deep learning algorithms, such as RNN, DNN, and CNN, with the highest AUROC achieved by STEP-OP [49], a model incorporating CNN and RNN (AUROC = 0.96). In general, deep learning has shown the potential to achieve more precise predictions than traditional ML in many areas of medical research [67], as it employs layers of neurons (as opposed to manual feature extraction in traditional ML) and thus detects more abstract and generalized connections in the data [68]. As a result, it has been speculated that deep learning algorithms may be able to detect subtle heralding changes in the arterial waveform which could be overlooked when represented as features in traditional ML models such as HPI [49]. Overall, although our meta-analysis demonstrated that HPI is very effective and indeed a formidable competitor to other prediction models, the model has been developed using LR [12], and non-HPI deep learning models hold the promise of even more accurate predictions in the future.
In addition, our subgroup analyses revealed that studies employing a cut-off of MAP < 65 mmHg for the definition of IOH performed significantly better in comparison to reports using other definitions. This finding is consistent with previous findings, as HPI also uses this definition of IOH [12] and has achieved excellent results by doing so, as evidenced by our analyses. Consequently, MAP < 65 mmHg appears to be the most suitable definition of IOH for the design of future IOH prediction models.

Limitations
Our study has several limitations. First, while the analyses of the HPI studies were not heterogeneous, a high degree of heterogeneity was observed among the analyses of non-HPI models, and could not be explained with subgroup analyses based on AI model types or the definitions of IOH. We believe this heterogeneity may be due to the diverse nature of the surgeries included, variations in population makeup, and (in the case of traditional ML models) differences in feature extraction. Second, while we were able to perform quantitative synthesis on 10 of the non-HPI studies, 11 of them could not be included because they did not report their AUROC or its confidence interval. In addition, many of the non-HPI studies were not of high quality, mainly due to underreporting of the analyses they performed to develop the model. We suggest that, moving forward, all AI development studies meticulously describe the steps they take to develop the model, and also report the AUROC with its confidence interval in order to be eligible for inclusion in quantitative syntheses. Further, to aid the design of future studies, it is advisable that features with high predictive values be discussed by the authors more extensively. Finally, while the results of our analyses regarding HPI are promising, the number of patients included in the studies is low and the study designs are far from robust. Additionally, as Li et al. [66] found, the grade of evidence regarding the associated outcomes is poor. We recommend more RCTs with larger sample sizes and higher quality be conducted to remedy this issue.

Conclusion
Artificial intelligence prediction models of intraoperative hypotension hold the potential to fundamentally change the way we treat this perilous condition. HPI, the first commercially available AI prediction model of IOH, demonstrated an excellent ability to predict hypotensive episodes and hence reduce the duration of hypotension. Other AI models, particularly those based on deep learning methods, also demonstrated a strong ability to predict IOH, while their capacity to reduce IOH-related indices such as duration remains unclear. This systematic review provides a comprehensive overview of these models, serving as a stepping stone for future studies that may be conducted in this field.

Fig. 3 Forest plot for the meta-analysis of hypotension prediction index studies comparing the time-weighted average of hypotension (MAP < 65 mmHg) between hypotension prediction index guided participants and participants receiving standard in-house protocols. a [33]. b [27]

Table 2
Characteristics of the non-HPI AI development studies